@sunjiweiswift sunjiweiswift commented Sep 10, 2025

Missing features

  1. GQA optimization: compute multiple Q heads against a single shared K in the same GEMM
  2. Sliding window -- done
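
As a rough illustration of the GQA item above (not the kernel in this PR): in grouped-query attention, several query heads share one K head, so their score GEMMs can be fused by stacking the query heads along the row (M) dimension and multiplying once against the shared K. The sketch below uses plain Python lists; the function names `matmul_nt`, `gqa_scores_per_head`, and `gqa_scores_packed` and the toy shapes are assumptions made for illustration only.

```python
def matmul_nt(a, b):
    """Return a @ b^T for row-major lists of lists (a: m x d, b: n x d)."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b]
            for row_a in a]


def gqa_scores_per_head(q_heads, k):
    """Baseline: one GEMM per query head, each against the same shared K."""
    return [matmul_nt(q, k) for q in q_heads]


def gqa_scores_packed(q_heads, k):
    """Packed: stack the group's query heads along M and run a single GEMM."""
    seq_q = len(q_heads[0])
    # (num_heads * seq_q) x d matrix built from all heads in the group
    stacked = [row for q in q_heads for row in q]
    scores = matmul_nt(stacked, k)
    # Split the (num_heads * seq_q) x seq_k result back into per-head blocks
    return [scores[h * seq_q:(h + 1) * seq_q] for h in range(len(q_heads))]


# Toy group: 2 query heads (seq_q = 2, head_dim = 2) sharing one K (seq_k = 3)
q_heads = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 1.0], [1.0, 2.0]],
]
k = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

assert gqa_scores_packed(q_heads, k) == gqa_scores_per_head(q_heads, k)
```

The packed form loads K once per group instead of once per query head, which is the usual motivation for this kind of fusion; how a real kernel tiles the stacked M dimension is not covered here.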

@sunjiweiswift
Author

@rolandschulz @tdeng5 @jiyang1011 please review

@tdeng5 self-requested a review on September 10, 2025 09:23
@sunjiweiswift force-pushed the flash_chunk_prefill branch 8 times, most recently from 0505aed to 8c5d3ce, on September 17, 2025 02:48
@sunjiweiswift
Author

@rolandschulz @tdeng5 @jiyang1011 please review

@sunjiweiswift
Author

@rolandschulz pls review

@sunjiweiswift
Author

@Antonyvance pls review again~

@Antonyvance

@sunjiweiswift I believe this needs to be reimplemented on top of PR 547. Would you be able to adapt it?

@sunjiweiswift
Author

> @sunjiweiswift I believe this needs to be reimplemented on top of PR 547. Would you be able to adapt it?

Can I merge this first? sglang-xpu already uses this kernel. However, its third-party dependency currently points to my forked repo, so it can't use the public repo yet. The new API will only be available after PR 547 is merged. I will adapt this kernel to it in a new PR.

@sunjiweiswift force-pushed the flash_chunk_prefill branch 2 times, most recently from 537afe5 to 8809894, on October 22, 2025 08:58
sunjiweiswift and others added 10 commits November 3, 2025 10:19
This change imports `SYCLCompat` into the cutlass-sycl repo as `compat`.
Previous dependencies on `syclcompat` are changed to `compat`.
This PR also fixes some `SYCLCompat` failures with oneAPI 2025.2.

---------

Co-authored-by: Roland Schulz <[email protected]>

Labels

redesign required: Implementation requires a redesign

5 participants